231 research outputs found

Modelling admixture to map genes underlying ethnic differences in phenotype

Author: Hoggart CJ
McKeigue PM
Shriver MD
Publication venue: Elsevier (Cell Press)
Publication date: 01/01/2002

Fregene: Simulation of realistic sequence-level data in populations and ascertained samples

Author: AG Clark
B Peng
BS Weir
CJ Hoggart
Clive J Hoggart
David J Balding
DJ Balding
E Setakis
I Tachmazidou
J Hey
JL Davies
John C Whittaker
Marc Chadeau-Hyam
Maria De Iorio
MJ Minichiello
Paul F O'Reilly
S Schaffner
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is implemented in FREGENE and provides the opportunity to test theoretical predictions and gain new insights into mechanisms of selection. We describe here the main functionalities of both FREGENE and SAMPLE, a companion program that can replicate association study datasets. Results: We report detailed analyses of six large simulated datasets that we have made publicly available. Three demographic scenarios are modelled: one panmictic, one substructured with migration, and one complex scenario that mimics the principal features of genetic variation in major worldwide human populations. For each scenario there is one neutral simulation, and one with a complex pattern of selection. Conclusion: FREGENE and the simulated datasets will be valuable for assessing the validity of models for selection, demography and population genetic parameters, as well as the efficacy of association studies. Its principal advantages are modelling flexibility and computational efficiency. It is open source and object-oriented. As such, it can be customised and the range of models extended.

Crossref

LSHTM Research Online

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

King's Research Portal

University of Melbourne Institutional Repository

Bayesian Variable Selection to identify QTL affecting a simulated quantitative trait

Author: Anouk Schurink
BL Fridley
CJ Hoggart
D Habier
G Sahana
Henri CM Heuven
JM Elsen
LLG Janss
Luc LG Janss
RE Kass
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Recent developments in genetic technology and methodology enable accurate detection of QTL and estimation of breeding values, even in individuals without phenotypes. The QTL-MAS workshop offers the opportunity to test different methods to perform a genome-wide association study on simulated data with a QTL structure that is unknown beforehand. The simulated data contained 3,220 individuals: 20 sires and 200 dams with 3,000 offspring. All individuals were genotyped, though only 2,000 offspring were phenotyped for a quantitative trait. QTL affecting the simulated quantitative trait were identified and breeding values of individuals without phenotypes were estimated using Bayesian Variable Selection, a multi-locus SNP model in association studies. Results: Estimated heritability of the simulated quantitative trait was 0.30 (SD = 0.02). Mean posterior probability of SNP modelled having a large effect (pˆi) was 0.0066 (95%HPDR: 0.0014-0.0132). Mean posterior probability of variance of second distribution was 0.409 (95%HPDR: 0.286-0.589). The genome-wide association analysis resulted in 14 significant and 43 putative SNP, comprising 7 significant QTL on chromosomes 1, 2 and 3 and putative QTL on all chromosomes. Assigning single or multiple QTL to significant SNP was not obvious, especially for SNP in the same region that were more or less in LD. Correlation between the simulated and estimated breeding values of 1,000 offspring without phenotypes was 0.91. Conclusions: Bayesian Variable Selection using thousands of SNP was successfully applied to genome-wide association analysis of a simulated dataset with unknown QTL structure. Simulated QTL with Mendelian inheritance were accurately identified, while imprinted and epistatic QTL were only putatively detected. The correlation between simulated and estimated breeding values of offspring without phenotypes was high.

Crossref

Springer - Publisher Connector

PubMed Central

Wageningen University & Research Publications

Utrecht University Repository

Significance testing in ridge regression for genetic data.

Author: AE Hoerl
AM Halawa
CI Amos
CJ Hoggart
D Altman
E Riboli
E Vago
Erika Cule
G Golub
H Zou
IE Frank
J Lawless
J McKay
JC Whittaker
JY Tzeng
K Ayers
M Chadeau-Hyam
M Park
M Zucknick
Maria De Iorio
N Malo
N Meinshausen
P Armitage
P Yang
Paolo Vineis
R Development Core Team
R Tibshirani
RJ Hung
SL Cessie
T Hastie
T Hsiang
T Truong
TA Manolio
The 1000 Genomes Project Consortium
WTCCC
Y Sun
Y Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2011

Published version

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery

Spiral - Imperial College Digital Repository

Probability that a chromosome is lost without trace under the neutral Wright-Fisher model with recombination

Author: B Padhukasahasram
Badri K. Padhukasahasram
CJ Hoggart
J Wakeley
JFC Kingman
M Kimmel
M Kimura
R Durret
RA Fisher
RC Griffiths
RD Hernandez
RR Hudson
S Wright
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/06/2010

I describe an analytical approximation for calculating the short-term probability of loss of a chromosome under the neutral Wright-Fisher model with recombination. I also present an upper and lower bound for this probability. Exact analytical calculation of this quantity is difficult and computationally expensive because the number of different ways in which a chromosome can be lost grows very large in the presence of recombination. Simulations indicate that the probabilities obtained using my approximate formula are always comparable to the true expectations provided that the number of generations remains small. These results are useful in the context of an algorithm that we recently developed for simulating Wright-Fisher populations forward in time. C++ programs that can efficiently calculate these formulas are available on request. Comment: Additional Information, Padhukasahasram et al. 2008, Genetics, FORWSIM algorithm
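
To make the quantity in this abstract concrete, here is a minimal sketch that estimates the probability of a single chromosome leaving no descendants within a few generations. It assumes the neutral Wright-Fisher model without recombination, which is a deliberate simplification: the paper's approximation and bounds concern the harder case with recombination and are not reproduced here. The function names and parameters are hypothetical, chosen only for illustration.

```python
import random


def one_generation_loss_exact(num_chromosomes):
    """Exact chance that one chromosome has no offspring in the next generation.

    Each of the num_chromosomes offspring picks a parent uniformly at random,
    so the focal chromosome is missed every time with probability (1 - 1/M)^M.
    """
    return (1.0 - 1.0 / num_chromosomes) ** num_chromosomes


def simulate_loss(num_chromosomes, generations, replicates, seed=0):
    """Monte Carlo estimate of loss within `generations` (no recombination)."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(replicates):
        copies = 1  # start from a single focal chromosome
        for _ in range(generations):
            p = copies / num_chromosomes
            # Binomial(M, p) draw: each offspring descends from the focal
            # lineage with probability p.
            copies = sum(1 for _ in range(num_chromosomes) if rng.random() < p)
            if copies == 0:
                lost += 1
                break
    return lost / replicates
```

For large populations the one-generation value approaches exp(-1), and the simulated estimate for a single generation should agree with the exact expression up to sampling noise.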

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

LASSO model selection with post-processing for a genome-wide association study data set

Author: A Dasgupta
Allan J Motyer
Chris McKendry
CJ Hoggart
J Fan
J Friedman
J Wu
J Yang
N Meinshausen
R Tibshirani
S Cho
Sally Galbraith
Susan R Wilson
Publication venue: BioMed Central
Publication date: 01/01/2011

Model selection procedures for simultaneous analysis of all single-nucleotide polymorphisms in genome-wide association studies are most suitable for making full use of the data for a complex disease study. In this paper we consider a penalized regression using the LASSO procedure and show that post-processing of the penalized-regression results with subsequent stepwise selection may lead to improved identification of causal single-nucleotide polymorphisms.
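
The LASSO step named in this abstract can be sketched as follows. This is a minimal illustration for a quantitative outcome, fitted by cyclic coordinate descent, which is one common solver and is assumed here rather than taken from the paper. The stepwise post-processing stage is not shown, and all names (`soft_threshold`, `lasso_coordinate_descent`, `lam`) are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of rows (individuals) by columns (SNPs); y is the outcome.
    Columns are assumed to be centred already; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic column carries no information
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta
```

Coefficients that end up exactly zero correspond to SNPs dropped from the model; the surviving set is what a subsequent selection stage would work on.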

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Australian National University

University of Melbourne Institutional Repository

Simultaneous analysis of all single-nucleotide polymorphisms in genome-wide association study of rheumatoid arthritis

Author: AB Begovich
CI Amos
CJ Hoggart
DJ Balding
George Mathew
Hongyan Xu
J Hoh
JC Barrett
M Chang
N Rish
RM Plenge
SM Sarasua
Varghese George
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref

AWclust: point-and-click software for non-parametric population structure analysis

Author: A Bowcock
AL Price
B Devlin
B Wu
CJ Hoggart
D Falush
ES Lander
G Guillot
H Tang
J Corander
J Marchini
J Mountain
JK Pritchard
Joshua D Starmer
KJ Dawson
L Excoffer
LL Cavalli-Sforza
M Bauchet
M Freedman
M Shriver
N Liu
N Patterson
N Rosenberg
NJ Risch
O Lao
PM McKeigue
R Kaeuffer
R Tibshirani
S Purcell
SL Guthery
X Gao
Xiaoyi Gao
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Population structure analysis is important to genetic association studies and evolutionary investigations. Parametric approaches, e.g. STRUCTURE and L-POP, usually assume Hardy-Weinberg equilibrium (HWE) and linkage equilibrium among loci in sample population individuals. However, the assumptions may not hold and allele frequency estimation may not be accurate in some data sets. The improved version of STRUCTURE (version 2.1) can incorporate linkage information among loci but is still sensitive to high background linkage disequilibrium. Nowadays, large-scale single nucleotide polymorphisms (SNPs) are becoming popular in genetic studies. Therefore, it is imperative to have software that makes full use of these genetic data to generate inference even when model assumptions do not hold or allele frequency estimation suffers from high variation. Results: We have developed point-and-click software for non-parametric population structure analysis distributed as an R package. The software takes advantage of the large number of SNPs available to categorize individuals into ethnically similar clusters and it does not require assumptions about population models. Nor does it estimate allele frequencies. Moreover, this software can also infer the optimal number of populations. Conclusion: Our software tool employs non-parametric approaches to assign individuals to clusters using SNPs. It provides efficient computation and an intuitive way for researchers to explore ethnic relationships among individuals. It can be complementary to parametric approaches in population structure analysis.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Carolina Digital Repository

Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies

Author: A Genkin
AC Morrison
B Servin
CC Holmes
CJ Hoggart
Clive J. Hoggart
David J. Balding
DJ Lunn
EI George
I Gradshteyn
IP Gorlov
JE Griffin
John C. Whittaker
L Breiman
M Bazaraa
M West
Maria De Iorio
MR Osborne
N Patterson
NR Wray
PD Sasieni
Peter M. Visscher
PJ Brown
R Sladek
R Tibshirani
S Zhang
SF Schaffner
TH Meuwissen
TJ Mitchel
Y Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation

CiteSeerX

Public Library of Science (PLOS)

Crossref

LSHTM Research Online

Directory of Open Access Journals

PubMed Central

UCL Discovery

University of Melbourne Institutional Repository

Effect of population stratification analysis on false-positive rates for common and rare variants

Author: AL Price
AP Morris
B Devlin
Brad G Kurowski
C Dering
CJ Hoggart
DC Thomas
ET Cirulli
Hua He
LA Almasy
Lili Ding
Lisa J Martin
NA Rosenberg
NJ Schork
P Paschou
SP Dickson
Tesfaye M Baye
TM Baye
Xue Zhang
Publication venue: BioMed Central
Publication date: 01/01/2011

Principal components analysis (PCA) has been successfully used to correct for population stratification in genome-wide association studies of common variants. However, rare variants also have a role in common disease etiology. Whether PCA successfully controls population stratification for rare variants has not been addressed. Thus we evaluate the effect of population stratification analysis on false-positive rates for common and rare variants at the single-nucleotide polymorphism (SNP) and gene level. We use the simulation data from Genetic Analysis Workshop 17 and compare false-positive rates with and without PCA at the SNP and gene level. We found that SNPs’ minor allele frequency (MAF) influenced the ability of PCA to effectively control false discovery. Specifically, PCA reduced false-positive rates more effectively in common SNPs (MAF > 0.05) than in rare SNPs (MAF < 0.01). Furthermore, at the gene level, although false-positive rates were reduced, power to detect true associations was also reduced using PCA. Taken together, these results suggest that sequence-level data should be interpreted with caution, because extremely rare SNPs may exhibit sporadic association that is not controlled using PCA.
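
As a rough illustration of the PCA step this abstract refers to, the sketch below computes the first principal component of a small genotype matrix by power iteration on the individual-by-individual Gram matrix. It is an assumption-laden toy, not the workshop analysis: the function names are hypothetical, genotypes are only centred (not scaled by allele frequency), and only one component is extracted.

```python
import math


def center_columns(genotypes):
    """Subtract each SNP's mean genotype so every column averages to zero."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def top_principal_component(genotypes, n_iter=200):
    """Return unit-length PC1 scores, one per individual, via power iteration."""
    centered = center_columns(genotypes)
    n = len(centered)
    # Gram matrix: similarity between every pair of individuals.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k]))
             for k in range(n)] for i in range(n)]
    # Deterministic, non-constant start vector; a constant vector would lie
    # in the null space of the Gram matrix of centred data.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        new_vec = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in new_vec))
        if norm == 0.0:
            break  # no variation in the data; keep the previous vector
        vec = [v / norm for v in new_vec]
    return vec
```

In a stratified sample, individuals from the same subpopulation tend to receive PC1 scores of the same sign; a typical correction would then include such scores as covariates in the association model.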

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central